
Add Anthropic Extended Thinking and Streaming Support #1

Open: damithsenanayake wants to merge 321 commits into main from claude-thoughts-stream
Conversation

damithsenanayake (Collaborator) commented Oct 2, 2025

Summary

This PR adds support for Anthropic's Extended Thinking feature and Interleaved Thinking (beta) for Claude models in the ADK Python SDK. The implementation enables developers to use Claude's reasoning capabilities with thinking block signatures and multi-step tool reasoning, while maintaining full compatibility with existing code.

Key Features

Extended Thinking Support

  • Converts ADK ThinkingConfig to Anthropic API format
  • Supports disabled (0) and explicit (>0) budget values
  • Raises ValueError for unlimited budget (-1) as it's not supported by Claude
  • Works in both streaming and non-streaming modes
  • Parses thinking blocks as Part(thought=True, thought_signature=...) in standard GenAI format
  • Preserves cryptographic signatures from thinking blocks
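
The budget rules above can be sketched as a small pure function. This is an illustration, not the code from this PR: `to_anthropic_thinking` is a hypothetical helper name, the returned dict follows the shape of the Anthropic Messages API `thinking` parameter (`type` plus `budget_tokens`), and using `None` to mean "thinking off" is an assumption.

```python
from typing import Optional


def to_anthropic_thinking(budget: Optional[int]) -> Optional[dict]:
  """Maps an ADK-style thinking budget to an Anthropic `thinking` dict.

  Hypothetical helper for illustration; the PR's actual code may differ.
  """
  if budget is None or budget == 0:
    # No config, or an explicit 0, means thinking stays off.
    return None
  if budget == -1:
    # Claude has no "automatic" budget, so fail loudly instead of guessing.
    raise ValueError(
        "Unlimited thinking budget (-1) is not supported by Claude."
    )
  # No minimum is enforced here; the API validates the value itself.
  return {"type": "enabled", "budget_tokens": budget}


thinking = to_anthropic_thinking(2048)
```

Raising on -1 rather than substituting a default keeps the budget an explicit decision for the caller, which matches the review feedback later in this thread.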

Interleaved Thinking (Beta)

  • Enables Claude to reason between tool calls rather than thinking once upfront
  • Opt-in via enable_interleaved_thinking=True configuration
  • Sends anthropic-beta: interleaved-thinking-2025-05-14 header when enabled
  • Allows more sophisticated multi-step reasoning with tools
  • Thinking blocks automatically preserved in assistant message history

Improved Streaming

  • Migrated to AsyncAnthropicVertex for async streaming support
  • Implements a thought suppression pattern: thinking deltas are accumulated and yielded as a single block, which prevents UI fragmentation
  • Streaming controlled exclusively by stream parameter
  • Converts Anthropic streaming events to ADK LlmResponse format
  • Final message built from complete content blocks to preserve signatures
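
A minimal sketch of the thought suppression idea, assuming a simplified event shape. `Delta`, `fake_stream`, and `suppress_thought_fragments` are hypothetical names used only for illustration; they are not Anthropic SDK types or the functions added in this PR.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Tuple


@dataclass
class Delta:
  """Stand-in for a streaming event; not the Anthropic SDK's event type."""
  kind: str  # "thinking_delta" or "text_delta"
  text: str


async def fake_stream() -> AsyncIterator[Delta]:
  """Yields a fixed sequence of events in place of a real API stream."""
  for delta in [
      Delta("thinking_delta", "Add the "),
      Delta("thinking_delta", "two numbers."),
      Delta("text_delta", "The answer "),
      Delta("text_delta", "is 4."),
  ]:
    yield delta


async def suppress_thought_fragments(
    events: AsyncIterator[Delta],
) -> AsyncIterator[Tuple[str, str]]:
  """Buffers thinking deltas and emits them once; passes text through."""
  thought_buffer: List[str] = []
  async for event in events:
    if event.kind == "thinking_delta":
      thought_buffer.append(event.text)
      continue
    if thought_buffer:
      # First non-thinking event: flush the whole thought as one chunk.
      yield ("thought", "".join(thought_buffer))
      thought_buffer.clear()
    yield ("text", event.text)
  if thought_buffer:
    # The stream ended while still thinking; flush what was buffered.
    yield ("thought", "".join(thought_buffer))


async def collect() -> List[Tuple[str, str]]:
  return [chunk async for chunk in suppress_thought_fragments(fake_stream())]


chunks = asyncio.run(collect())
```

Here the two thinking deltas arrive at the consumer as one `("thought", ...)` chunk, while text deltas are still forwarded incrementally.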

Backward Compatibility

  • 100% backward compatible - no breaking changes
  • All existing tests pass
  • Non-streaming mode supports thinking blocks
  • Thinking blocks ordering preserved (must come first per Anthropic API requirement)
  • Interleaved thinking disabled by default

Implementation Details

Core Changes

File: src/google/adk/models/anthropic_llm.py

  1. Import AsyncAnthropicVertex (line 33)

    • Added async client for streaming support
  2. Enhanced content_to_message_param() (lines 142-179)

    • Separates thinking blocks from other content blocks
    • Ensures thinking blocks come FIRST (Anthropic API requirement)
  3. Enhanced content_block_to_part() (lines 182-219)

    • Detects thinking blocks via thinking attribute or type='thinking'
    • Extracts and preserves signature (base64-encoded bytes)
    • Returns Part(thought=True, thought_signature=signature) for thinking content
    • Logs signature presence for debugging
  4. New streaming_event_to_llm_response() (lines 222-278)

    • Converts Anthropic streaming events to ADK format
    • Handles text_delta, thinking_delta, and message_delta events
    • Returns LlmResponse with proper partial=True flag
  5. Rewritten generate_content_async() (lines 366-508)

    • Extracts and converts thinking config from ADK to Anthropic format
    • Raises ValueError for budget=-1 (unlimited not supported)
    • Applies budget directly (no minimum enforcement - handled by API)
    • Configures beta header for interleaved thinking when enabled
    • Streaming controlled only by stream parameter
    • Accumulates thinking deltas and yields as single block (prevents UI fragmentation)
    • Builds final response from final_message.content to preserve signatures
    • Passes thinking and extra_headers parameters to both streaming and non-streaming API calls
  6. Added enable_interleaved_thinking field (line 360)

    • Boolean flag to enable interleaved thinking beta feature
    • Default: False (opt-in)
  7. Updated _anthropic_client property (lines 510-524)

    • Changed from AnthropicVertex to AsyncAnthropicVertex
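
The thinking block detection and signature handling described for content_block_to_part() could look roughly like the sketch below. It uses stand-in types so it runs without the SDKs: `ThoughtPart` is a hypothetical substitute for the GenAI `Part`, and `block_to_part` is an illustrative name. Treating the signature as a base64 string that is decoded to bytes is an assumption based on the description above, not a confirmed detail of the implementation.

```python
import base64
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class ThoughtPart:
  """Stand-in for google.genai types.Part; only the fields used here."""
  text: str
  thought: bool = False
  thought_signature: Optional[bytes] = None


def block_to_part(block: Any) -> ThoughtPart:
  """Converts a content block to a part, keeping any thinking signature."""
  is_thinking = (
      getattr(block, "type", None) == "thinking"
      or getattr(block, "thinking", None) is not None
  )
  if not is_thinking:
    return ThoughtPart(text=getattr(block, "text", ""))
  signature = getattr(block, "signature", None)
  # Assumption: the signature is a base64 string and is stored as raw bytes.
  raw = base64.b64decode(signature) if signature else None
  return ThoughtPart(
      text=getattr(block, "thinking", "") or "",
      thought=True,
      thought_signature=raw,
  )


thinking_block = SimpleNamespace(
    type="thinking",
    thinking="Compare both options.",
    signature=base64.b64encode(b"sig-bytes").decode("ascii"),
)
part = block_to_part(thinking_block)
text_part = block_to_part(SimpleNamespace(type="text", text="Done."))
```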

Test Coverage

Test Files (50 tests total):

  1. test_anthropic_thinking.py (26 tests)

    • Budget validation (raises error on -1, accepts 0 and positive values)
    • Specific budget values (5000, and the 1024 minimum)
    • Thinking block parsing with signatures (base64-encoded)
    • Type-based thinking detection with signatures
    • No-config baseline testing
    • Streaming mode with thinking enabled
    • NEW: Interleaved thinking tests (10 tests):
      • Beta header NOT sent by default
      • Beta header sent in streaming mode when enabled
      • Beta header sent in non-streaming mode when enabled
      • Beta header only sent when thinking config is active
      • Thinking blocks preserved in assistant message history
  2. test_anthropic_streaming.py (8 tests)

    • Event conversion (text_delta, thinking_delta, usage_delta)
    • Start/stop event handling
    • Event-to-response transformation
  3. test_anthropic_llm.py (16 tests)

    • Existing tests continue to pass
    • No changes required

Code Quality

  • Style: Google Python Style Guide compliant
  • Formatting: Applied isort and pyink formatting
  • Documentation: Comprehensive inline comments explaining logic
  • Type Safety: Proper type hints and assertions
  • Signature Handling: Proper base64 encoding/decoding for cryptographic signatures

Testing Results

tests/unittests/models/test_anthropic_llm.py ............ 16/16 PASSED
tests/unittests/models/test_anthropic_thinking.py ...... 26/26 PASSED
tests/unittests/models/test_anthropic_streaming.py .....  8/8 PASSED

Total: 50/50 tests PASSED

No regressions - all existing tests continue to pass.

Breaking Changes

None. This PR is 100% backward compatible.

Usage Examples

Basic Extended Thinking

from google.genai import types
from google.adk.models.anthropic_llm import Claude
from google.adk.agents import Agent

# Enable extended thinking with explicit budget
agent = Agent(
    model=Claude(
        model="claude-opus-4-1@20250805",
        max_tokens=4096
    ),
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=2048  # Explicit budget (min 1024)
        )
    )
)

# Disable thinking
config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(
        include_thoughts=True,
        thinking_budget=0  # Disabled
    )
)

Interleaved Thinking with Tool Use

# Enable interleaved thinking for multi-step tool reasoning
agent = Agent(
    model=Claude(
        model="claude-opus-4-1@20250805",
        max_tokens=4096,
        enable_interleaved_thinking=True  # Beta: reason between tool calls
    ),
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=2048
        )
    ),
    tools=[calculator_tool, database_tool]
)

Accessing Thinking Blocks

# Streaming mode (explicit control)
async for response in model.generate_content_async(request, stream=True):
    if response.content:
        for part in response.content.parts:
            if part.thought:
                # Access thinking block with signature
                print(f"Thought: {part.text}")
                print(f"Signature: {part.thought_signature}")

Streaming Decision Logic

Streaming is controlled exclusively by the stream parameter:

  • stream=True: Uses streaming mode via messages.stream()
  • stream=False: Uses non-streaming mode via messages.create()

Both modes support thinking blocks when the thinking parameter is provided.

Interleaved Thinking Details

When to Use:

  • Multi-step problems requiring reasoning between tool calls
  • Complex tool chains where intermediate results inform next steps
  • Scenarios where Claude needs to "think about" tool results before proceeding

How It Works:

  • Non-interleaved (default): Claude thinks once, makes all tool decisions upfront
  • Interleaved (beta): Claude can reason about tool results before deciding what to do next
  • Thinking blocks are automatically preserved when passing assistant messages back to the API
  • Beta header only sent when BOTH enable_interleaved_thinking=True AND thinking is enabled
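
The header condition in the last bullet can be sketched as follows. `build_extra_headers` is a hypothetical helper name, and using `None` to mean "thinking not configured" is an assumption made for the sketch; the header name and value are the ones listed under Key Features.

```python
from typing import Dict, Optional

INTERLEAVED_THINKING_BETA = "interleaved-thinking-2025-05-14"


def build_extra_headers(
    enable_interleaved_thinking: bool, thinking: Optional[dict]
) -> Optional[Dict[str, str]]:
  """Returns the beta header only when thinking is actually enabled."""
  if enable_interleaved_thinking and thinking is not None:
    return {"anthropic-beta": INTERLEAVED_THINKING_BETA}
  # Either the flag is off or thinking is disabled: send no extra header.
  return None


headers = build_extra_headers(
    True, {"type": "enabled", "budget_tokens": 2048}
)
```

Setting the flag without an active thinking config therefore has no effect on the request.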

Requirements:

  • Claude 4 models (Opus 4, Sonnet 4, etc.)
  • Thinking config must be enabled (thinking_budget > 0)
  • Only supports automatic tool selection (tool_choice: auto)

Key Implementation Notes

  1. Signature Preservation: Thinking blocks now preserve cryptographic signatures from the Anthropic API
  2. Thinking Block Ordering: Content blocks automatically reordered to place thinking blocks first (Anthropic requirement)
  3. Non-Streaming Thinking: Non-streaming mode correctly passes thinking parameter to API
  4. Budget Validation: Unlimited budget (-1) raises clear error message
  5. Final Message: Streaming builds final response from content blocks (not accumulated strings) to preserve signatures
  6. Beta Header: Conditionally set based on enable_interleaved_thinking flag and thinking config
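
The reordering in note 2 amounts to a stable partition of the content blocks. A minimal sketch, assuming blocks are plain dicts with a `type` key; `thinking_first` is a hypothetical name and the signature value is a placeholder.

```python
from typing import Any, Dict, List


def thinking_first(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
  """Stable partition: thinking blocks first, everything else after."""
  thinking = [b for b in blocks if b.get("type") == "thinking"]
  others = [b for b in blocks if b.get("type") != "thinking"]
  return thinking + others


blocks = [
    {"type": "text", "text": "Let me check."},
    {
        "type": "thinking",
        "thinking": "Need the calculator.",
        "signature": "placeholder",  # Not a real signature.
    },
    {"type": "tool_use", "id": "call_1", "name": "calculator", "input": {}},
]
ordered = thinking_first(blocks)
```

The relative order within each group is unchanged, so text and tool_use blocks keep their original sequence.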

@pandemosth

Does this support tool use with interleaved thinking?
https://docs.claude.com/en/docs/build-with-claude/extended-thinking#tool-use-with-interleaved-thinking

Comment thread src/google/adk/models/anthropic_llm.py Outdated
budget = llm_request.config.thinking_config.thinking_budget
if budget:
  if budget == -1:
    # Automatic thinking budget - use recommended default of 10000 tokens
If Anthropic doesn't support an automatic thinking budget, we shouldn't fake it here. Throw an error instead, as it's important that the end user considers the thinking budget for their use case, as per https://docs.claude.com/en/docs/build-with-claude/extended-thinking#working-with-thinking-budgets

Collaborator Author

done...

Comment thread src/google/adk/models/anthropic_llm.py Outdated
use_streaming = (
    stream  # From runtime context (streaming_mode == SSE)
    or thinking
    != NOT_GIVEN  # Extended thinking requires streaming (Anthropic-specific)
Anthropic only supports extended thinking if streaming?

Collaborator Author

nope... only for larger thinking budgets. now revised.

@damithsenanayake
Collaborator Author

Does this support tool use with interleaved thinking? https://docs.claude.com/en/docs/build-with-claude/extended-thinking#tool-use-with-interleaved-thinking

not yet: on it

Jacksunwei and others added 26 commits October 9, 2025 11:24
… on tool confirmation requests

PiperOrigin-RevId: 817373403
… case

Agent developers can now create an eval set and add eval cases from the command line. Adding an eval case is limited to specifying conversation scenarios.

Sample commands:
- Create an eval set:
adk eval_set create \
    contributing/samples/hello_world \
    set_01

- Add an eval case with scenario file
Content of scenarios.json file:
'{"scenarios": [{"starting_prompt": "hello", "conversation_plan": "world"}]}'

adk eval_set add_eval_case \
    contributing/samples/hello_world \
    set_01 \
    --scenarios scenarios.json

PiperOrigin-RevId: 817456117
The added section provides details for the community call on Oct 15, 2025, including the agenda and links to join and add to calendars.

PiperOrigin-RevId: 817457276
This CL updates the "What's new" section to include Resumability, ReflectRetryToolPlugin, Context compaction, and Search tool support. It also moves "Agent Config" and "Tool Confirmation" from "What's new" to "Key Features".

PiperOrigin-RevId: 817469210
This change removes the `evaluate`, `_evaluate_row`, `are_tools_equal`, `_remove_tool_outputs`, `_report_failures`, and `_print_results` static methods from `TrajectoryEvaluator`, along with their corresponding unit tests. These methods were previously marked as deprecated.

PiperOrigin-RevId: 817477494
…ertexAiSearchTool

PiperOrigin-RevId: 817493869
Scores with values 0 (or 0.0) were also getting excluded.

PiperOrigin-RevId: 817658059
…ssue google#3131

Changed the LiteLLM content conversion so that Part.file_data.file_uri (like the gs://…) becomes a file object with file_id, making sure GCS-backed files reach LiteLLM proxies instead of being dropped. Added unit tests covering both the _get_content and _content_to_message_param paths for file URIs.

PiperOrigin-RevId: 817658432
…l whether it's in progress updates (thought) or the final response

PiperOrigin-RevId: 817682171
Directs agent to avoid deleting existing content

PiperOrigin-RevId: 817823999
… for each recommended change

PiperOrigin-RevId: 817831087
…earchTool and VertexAiSearchTool

PiperOrigin-RevId: 818053371
This change removes the `run_evals` function and its helper `_get_evaluator` from `cli_eval.py`, as they were marked as deprecated. Corresponding test mocks and patches in `test_fast_api.py` are also removed.

PiperOrigin-RevId: 818719422
This is so we don't need to worry about the side effects of Loop in every agent type. Custom agents should do the same if they contain a loop.

PiperOrigin-RevId: 818766305
PiperOrigin-RevId: 818781277
Update plugin manager and built-in plugins to prioritize CallbackContext. Keep InvocationContext access for legacy plugins with adapter. Change callback docs/tests to cover the new context.

PiperOrigin-RevId: 818798087
…onstructing a2a request

PiperOrigin-RevId: 818813897
Update plugin manager and built-in plugins to prioritize CallbackContext. Keep InvocationContext access for legacy plugins with adapter. Change callback docs/tests to cover the new context.

PiperOrigin-RevId: 818822267
…oogle#1683

- add a shared --structured_logs flag to adk web and adk api_server so users can opt into JSON-formatted output
- introduce CloudTraceJSONFormatter that emits structured entries and attaches current Cloud Trace/Span IDs when an OpenTelemetry context is active
- update CLI logging setup to clear duplicate stdout handlers when Cloud Logging is enabled and to reconfigure existing handlers (like from Uvicorn) so they also pick up the structured format and requested log level

With the flag disabled the CLIs keep their existing text logs; when enabled, the services now produce Cloud Logging–friendly JSON that can be correlated with distributed traces.

PiperOrigin-RevId: 818823818
GWeale and others added 30 commits November 4, 2025 09:44
LiteLLM providers can extract the MIME type from the data URI. Removing the separate `format` field avoids redundancy and potential issues with backends that may reject requests containing this field.

Close google#2017

Co-authored-by: George Weale <gweale@google.com>
PiperOrigin-RevId: 828014286
Co-authored-by: George Weale <gweale@google.com>
PiperOrigin-RevId: 828022792
…lbacks demo

Related: google#2292

Co-authored-by: Shan Cao <caoshan@google.com>
PiperOrigin-RevId: 828024955
This lets users specify `drop_params` when initializing `LiteLlm`, which will be forwarded to LiteLLM's `acompletion` or `completion` calls

Close google#1718

Co-authored-by: George Weale <gweale@google.com>
PiperOrigin-RevId: 828058105
Co-authored-by: Yifan Wang <wanyif@google.com>
PiperOrigin-RevId: 828061540
…LLM usage

Closes google#3049

Co-authored-by: Eliza Huang <heliza@google.com>
PiperOrigin-RevId: 828091671
Populate the usage_metadata field for live events with the metadata provided by the Gemini live API.

Co-authored-by: Kathy Wu <wukathy@google.com>
PiperOrigin-RevId: 828124232
This change introduces a new section in the README.md to highlight the `adk-python-community` GitHub repository, describing it as a place for community-contributed tools and integrations.

Co-authored-by: Hangfei Lin <hangfei@google.com>
PiperOrigin-RevId: 828155205
Users were getting spammed with this log even though their tools didn't require authentication. To fix, reduce the log level to DEBUG so that it doesn't show up by default.

Co-authored-by: Kathy Wu <wukathy@google.com>
PiperOrigin-RevId: 828161281
Add support for MCP prompts via the McpInstructionProvider class, which can be specified as an agent's instruction.

Co-authored-by: Kathy Wu <wukathy@google.com>
PiperOrigin-RevId: 828166051
This change introduces BigQueryLoggerConfig to allow customization of the BigQueryAgentAnalyticsPlugin. Users can now enable/disable the plugin, specify event type allowlists and denylists, and provide a custom function to format or redact the content field before logging to BigQuery. The content logged for model and tool errors has also been enhanced.

PiperOrigin-RevId: 828172241
…Engine

Co-authored-by: Yeesian Ng <ysian@google.com>
PiperOrigin-RevId: 828178479
Merge google#2651

### Summary
Correct a misspelling in the build configuration:
- "swtich" → "switch" in `pyproject.toml`.

### Rationale
This is a spelling fix only. It improves readability and avoids potential confusion in configuration.
There is no impact on runtime behavior, tests, or public APIs.

### Notes
- Follows Conventional Commits style for build/config changes (`build:`).
- CLA status should be green via the Google CLA bot.

COPYBARA_INTEGRATE_REVIEW=google#2651 from marsboy02:docs/fix-type-pyproject b78c014
PiperOrigin-RevId: 828221776
Merge google#3381

### Link to Issue or Description of Change

**1. Link to an existing issue (if applicable):**

- Closes: google#3363
- This PR sets a max column width for the table printed in detailed output of agent evaluations.

**Problem:**
The detailed output of agent evaluations is not readable due to rows in the table getting wrapped. This happens when there are long text values in cells.

**Solution:**
Existing code uses the `tabulate` Python package to format the table. We can set a maximum column width using the `maxcolwidths` parameter. I have set it to `25`.

### Testing Plan

I have manually tested if the output is properly displayed after changes. Please let me know if any unit tests can be added for this.

**Unit Tests:**

- [ ] I have added or updated unit tests for my change.
- [x] All unit tests pass locally.

**Manual End-to-End (E2E) Tests:**

1. Create a simple agent using adk (preferably an agent that outputs a long text).
2. Create an evalset for this agent.
3. Run the evalset with `print_detailed_results` option and check if the output is properly displayed.

If you want a quick setup for testing this, I have a sample repo with an agent and an evalset [here](https://github.com/nimanthadilz/adk-test/tree/reproduce-print-detailed-results). You will have to manually build & install the fixed adk version to test it.

### Checklist

- [x] I have read the [CONTRIBUTING.md](https://github.com/google/adk-python/blob/main/CONTRIBUTING.md) document.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have added tests that prove my fix is effective or that my feature works.
- [x] New and existing unit tests pass locally with my changes.
- [x] I have manually tested my changes end-to-end.
- [x] Any dependent changes have been merged and published in downstream modules.

COPYBARA_INTEGRATE_REVIEW=google#3381 from nimanthadilz:fix-eval-output-rows-wrapping-issue f6d4012
PiperOrigin-RevId: 828265715
Co-authored-by: Xuan Yang <xygoogle@google.com>
PiperOrigin-RevId: 828313025
Co-authored-by: Yeesian Ng <ysian@google.com>
PiperOrigin-RevId: 828460860
The sample agent now uses updated model names for Gemini Live, including a new Vertex model as the default and a new AI Studio model option.

Co-authored-by: Hangfei Lin <hangfei@google.com>
PiperOrigin-RevId: 828515811
The blob content is often large and binary, which makes the logs unreadable and can cause excessive logging.

Co-authored-by: Hangfei Lin <hangfei@google.com>
PiperOrigin-RevId: 828523413
Co-authored-by: Xuan Yang <xygoogle@google.com>
PiperOrigin-RevId: 828533243
This allows state to be passed across agents

Co-authored-by: Shangjie Chen <deanchen@google.com>
PiperOrigin-RevId: 828557989
Co-authored-by: Xuan Yang <xygoogle@google.com>
PiperOrigin-RevId: 828560608
Merge google#3407

This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling)

Note: while I use tooling to identify errors, the tooling doesn't _actually_ provide the corrections, I'm picking them on my own. I'm a human, and I may make mistakes.

### Testing Plan

The misspellings have been reported at https://github.com/jsoref/adk-python/actions/runs/19056081305/attempts/1#summary-54426435973

The action reports that the changes in this PR would make it happy: https://github.com/jsoref/adk-python/actions/runs/19056081446/attempts/1#summary-54426436321

**Unit Tests:**

- [ ] I have added or updated unit tests for my change.
- [ ] All unit tests pass locally.

_Please include a summary of passed `pytest` results._

**Manual End-to-End (E2E) Tests:**

_Please provide instructions on how to manually test your changes, including any
necessary setup or configuration. Please provide logs or screenshots to help
reviewers better understand the fix._

### Checklist

- [x] I have read the [CONTRIBUTING.md](https://github.com/google/adk-python/blob/main/CONTRIBUTING.md) document.
- [x] I have performed a self-review of my own code.
- [ ] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have added tests that prove my fix is effective or that my feature works.
- [ ] New and existing unit tests pass locally with my changes.
- [ ] I have manually tested my changes end-to-end.
- [ ] Any dependent changes have been merged and published in downstream modules.

### Additional context

- google#3382 (comment)

COPYBARA_INTEGRATE_REVIEW=google#3407 from jsoref:spelling-issue-template ce8febc
PiperOrigin-RevId: 828610865
Replace the full JSON schema dump with a compact text summary of key AgentConfig components like LlmAgent, ToolConfig, and GenerateContentConfig

Co-authored-by: George Weale <gweale@google.com>
PiperOrigin-RevId: 828627911
- add FileArtifactService that persists artifacts to the local filesystem
- adjust BaseArtifactService and exports so callers can wire in the file-backed implementation

Co-authored-by: George Weale <gweale@google.com>
PiperOrigin-RevId: 828629298